(57) Abstract: Polymerases from the Pol I family that are able to use ddNTPs efficiently have demonstrated much improved performance when used to sequence DNA. A number of mutations have been made to the gene coding for the Pol II family DNA polymerase from the archaeon Pyrococcus furiosus with the aim of improving ddNTP utilisation. "Rational" alterations to amino acids likely to be near the dNTP binding site (based on sequence homologies and structural information) did not yield the desired level of selectivity for ddNTPs. However, alteration at four positions (Q472, A486, L490 and Y497) gave rise to variants which incorporated ddNTPs better than the wild type, allowing sequencing reactions to be carried out at lowered ddNTP:dNTP ratios. Wild type Pfu-Pol required a ddNTP:dNTP ratio of 30:1; values of 5:1 (Q472H), 1:3 (L490W), 1:5 (A486Y) and 5:1 (Y497A) were found with the four mutants, A486Y representing a 150-fold improvement over the wild type. A486, L490 and Y497 are on an α-helix that lines the dNTP binding groove, but the side chains of the three amino acids point away from this groove; Q472 is in a loop that connects this α-helix to a second long helix. None of the four amino acids can contact the dNTP directly. Therefore, the increased selectivity for ddNTPs is likely to arise from two factors: 1) small overall changes in conformation that subtly alter the nucleotide triphosphate binding site such that ddNTPs become favoured; 2) interference with a conformational change that may be critical both for the polymerisation step and for discrimination between different nucleotide triphosphates.






WO 01/38546 




PCT/US00/31830



IMPROVING DIDEOXYNUCLEOTIDE-TRIPHOSPHATE UTILIZATION
BY THE HYPER-THERMOPHILIC DNA POLYMERASE FROM THE
ARCHAEON PYROCOCCUS FURIOSUS

Field of Invention

The instant disclosure pertains to DNA polymerase mutants from Pyrococcus 
furiosus which exhibit improved dideoxynucleotide utilization. 

Background of Invention 

DNA polymerases constitute a core component of DNA sequencing methods, a widespread and important biotechnology based on chain termination by dideoxynucleotide-triphosphates (ddNTPs), either the ddNTPs themselves or their fluorescent derivatives. Discrimination between chain-terminating ddNTPs and dNTPs plays a key role in DNA sequencing performance. Effective ddNTP incorporation is associated with high uniformity of signal intensity in sequencing ladders. Furthermore, efficient usage of ddNTPs means that lower concentrations are required, an advantage when fluorescent terminators are used, as large excesses give rise to high backgrounds. Bacteriophage T7 DNA polymerase incorporates ddNTPs much more efficiently than the enzymes from E. coli and T. aquaticus and, as a consequence, gives superior sequencing ladders. The molecular basis for discrimination between dNTPs and ddNTPs resides in a single amino acid at an equivalent location: Y526 (T7); F762 (E. coli); F667 (T. aquaticus). The T7 mutant Y526F shows a much reduced ability to use ddNTPs and, consequently, gives poor sequencing ladders. The F762Y and F667Y variants of the E. coli and T. aquaticus enzymes use ddNTPs effectively and show much improved sequencing properties. The important role of F762 in the E. coli polymerase for deoxynucleotide-triphosphate recognition has been confirmed by a more complete kinetic analysis. Mutants of T. aquaticus DNA polymerase which lack exonuclease activity and contain tyrosine at position 667 have excellent sequencing properties and are perhaps the most widely used enzymes for DNA sequencing; these include Thermo Sequenase™ and AmpliTaq FS™ DNA polymerases. All three polymerases belong to the Pol-I family (also called family A polymerases), and the critical aromatic amino acid is found in a highly conserved stretch, the B-motif (also called region III) (Figure 1). Structural data have shown that the B-motif amino acids form an α-helix (the O-helix in the case of E. coli) with the most conserved amino acids on one side of the helix, forming part of the dNTP binding site (Figure 1). The tyrosine/phenylalanine is near the sugar of the dNTP, rationalising its critical role in dNTP/ddNTP selection.

Polymerases with thermal stability are routinely used for DNA sequencing. Not only are they generally more robust than mesophilic enzymes, but they are essential for cycle-sequencing protocols, which involve heat-cool cycles. The extreme thermostability of polymerases purified from hyper-thermophilic archaea suggests that these enzymes have potential use in DNA sequencing. However, archaeal polymerases often use ddNTPs poorly and, as a result, are generally not as useful in DNA sequencing as, for example, Thermo Sequenase™ DNA polymerase. Archaeal polymerases belong to the α-family (also called the B-family, or Pol-II family), a different group from the better characterised Pol-I enzymes. However, sequence alignment shows that the Pol-α family also has a B-motif, even though it cannot be exactly aligned with that of Pol-I (8, 9) (Figure 1), and there is no exact counterpart to the aromatic amino acid critical for ddNTP/dNTP selection in the Pol-I family. Nevertheless, mutations in the B-motif of α-polymerases influence dNTP binding, suggesting a role in deoxynucleoside-triphosphate recognition. Recently, crystal structures of two Pol-α members, bacteriophage RB69 gp43 and Thermococcus gorgonarius (Tgo) polymerase, have been published. The B-motif amino acids form an α-helix (the P-helix in both enzymes) which, as suggested by sequence alignments, is similar, but not structurally identical, to the corresponding α-helix in the Pol-I family (Figure 1). Although both structures lack bound nucleic acid, it was possible to model primer-template and dNTP into RB69. The B-region was located near both primer/template and dNTP but did not appear to provide an amino acid that binds near the sugar of the dNTP. Rather, this might be supplied by tyrosine 416, an amino acid from another part of the polymerase, which packs under the sugar ring of the dNTP.

Sequence and structural comparisons of Pol-I and Pol-α members indicate that homologous regions are used for dNTP binding and recognition. However, the exact details of the interaction with dNTPs, and hence of discrimination between dNTPs and ddNTPs, differ between the two classes. Therefore, the simple tyrosine/phenylalanine switch, so successful in converting the T. aquaticus enzyme into a good sequencing polymerase, is unlikely to be possible with archaeal polymerases.

Summary of Invention

The instant invention comprises Pyrococcus furiosus polymerase mutants that recognise ddNTPs up to 150-fold better than the wild type enzyme and are superior to it for cycle-sequencing protocols. Methods for the isolation and characterisation of these mutants are discussed. The mutants exhibit superior thermal stability compared to other thermostable DNA polymerases, remaining stable for hours at 95 °C (compared with 30-45 minutes for T. aquaticus polymerase), thereby permitting a greater number of cycles at elevated temperature and, hence, enhanced sensitivity.

Brief Description of the Figures 

The file of this patent contains at least one drawing executed in color. Copies of this patent with color drawing(s) will be provided by the Patent and Trademark Office upon request and payment of the necessary fee.

FIGURE 1. A: The B-motif (region III) of polymerases from the Pol-I and Pol-α families. With Pol-I, the R, K, F and YG indicated in green, and the spacing between them, are highly conserved (8-10). The F, shown in green and underlined, is critical in discrimination between dNTPs and ddNTPs (4). In the case of Pol-α, the Q, K, N and YG shown in green, and their spacing, are conserved. It is difficult to deduce the optimal line-up between the two families, due to variation in both the conserved amino acids and their spacing (8-10). The alignment shown (others are possible by including gaps) has the critical F in Pol-I replaced by an N (underlined) in Pol-α.

FIGURE 1. B: Structure of the B-motif of a Pol-I enzyme, the E. coli Klenow fragment (11). The conserved amino acids (side chains shown in green) lie on one side of an α-helix and interact with dNTP (shown in red). F762 is near the sugar ring, explaining its role in dNTP/ddNTP selection.

FIGURE 1. C: Structure of the B-motif of the Pol-α enzyme from Thermococcus gorgonarius (28). Although the conserved amino acids (green) are on one side of an α-helix, as for Pol-I, the side chains, and their relative disposition, differ between the two families and are unlikely to interact with dNTPs identically. The α-helix of Tgo-Pol is also distorted by the presence of a short stretch of 3₁₀-helix between N491 and Y494, resulting in a further difference between the two families. All structures were generated using RasMol (35).

FIGURE 2. A: The amino acids in, and surrounding, the B-region of Thermococcus gorgonarius (Tgo) and Pyrococcus furiosus (Pfu) polymerases. The amino acids shown in green are highly conserved and correspond to the amino acids also illustrated in green in Figure 1. Amino acids shown in magenta, when mutated in Pfu-Pol, give better sequencing performance. Note: Pfu-Pol contains a single amino acid insertion, upstream of this region, when compared to Tgo-Pol. This accounts for the numbering of corresponding amino acids differing by one.

FIGURE 2. B: The structure formed by the Pfu-Pol amino acids shown in part A, which comprises an α-helix (the P-helix) flanked by loop regions (shown in blue). The conserved, green, amino acids (Q484, K488, N492 and Y495) lie on one side of the helix and form part of the dNTP binding site. Three of the "improving", magenta, amino acids (A486, L490 and Y497) lie in the helical region but on the opposite side to the conserved amino acids. The fourth (Q472) is in a stretch of loop that precedes the α-helix. The side chains of these amino acids are shown in space-filling mode.

FIGURE 2. C: End-on view of the P-helix region of Pfu-Pol clearly showing that the side chains (shown in space-filling mode) of the conserved (green) and "improving" (magenta) amino acids protrude on opposite sides of the helix. All structures were generated using RasMol (35).

FIGURE 3 is a sequencing gel in which the dideoxynucleotide-triphosphate utilization of wild type DNA polymerase from Pyrococcus furiosus is compared against that of the Pyrococcus furiosus DNA polymerase mutants Q472H/N492H, A486Y, L490W, and Y497A.

FIGURE 4 is the nucleotide sequence corresponding to Pfu-Pol exo⁻.

Detailed Description of Invention

The following illustrates certain preferred embodiments of the invention but is not intended to illustrate all embodiments.

Methods and Materials 
Construction of a Pfu-Pol expression vector

A gene coding for Pfu-Pol (Figure 4) was excised from the plasmid pTM121 (prepared in house by Amersham Pharmacia Biotech Inc.) as an NdeI-SmaI fragment and ligated into NdeI-EcoRV cut pET-17b (Novagen) to give pET-17b(Pfu-Pol). This manipulation destroys the unique EcoRV restriction site present in pET-17b. The Pfu-Pol gene used contains the mutation D215A, which abolishes the 3' to 5' exonuclease activity of the enzyme. All experiments were performed with the exo⁻ form of the enzyme.

Expression and Purification of Pfu-Pol 

E. coli BL21(DE3) (Novagen) containing pET-17b(Pfu-Pol) was grown at 37 °C until an A600 of 0.5 was reached. Protein expression was induced by adding IPTG to a final concentration of 1 mM and continuing growth for another 4 hours. Cells were harvested by centrifugation at 4 °C (5,000 rpm for 20 minutes) and resuspended in 10 mM Tris pH 8.0, 100 mM NaCl, 1 mM PMSF, 1 mM benzamidine. After sonication on ice (10 × 15 s pulses), samples were centrifuged at 10,000 rpm for 20 minutes. The supernatant was incubated with approximately 20 units of DNase I (Boehringer Mannheim) for 30 minutes to hydrolyse DNA. Next, the supernatant was heated at 75 °C for 20 minutes to denature most of the E. coli proteins and inactivate the DNase I. Precipitated proteins were removed by centrifugation (10,000 rpm for 20 minutes), and the supernatant was loaded onto a 20 ml DEAE-Sephacel column, equilibrated and eluted with Tris pH 8.0, 100 mM NaCl. The flow-through was collected and immediately applied to a 20 ml Heparin-Sepharose column, equilibrated with Tris pH 8.0, 100 mM NaCl. The column was developed with a 100 mM to 700 mM NaCl gradient in Tris pH 8.0. Fractions were analysed by SDS-PAGE using Coomassie Blue staining, and those containing a protein running at approximately 90 kD were pooled and concentrated using Centriprep 50 spin concentrators (Amicon). Protein samples were estimated to be >95% pure as judged by SDS-PAGE. All purified protein samples were stored at -20 °C as 50% glycerol stocks.
Random mutagenesis of Pfu-Pol and selection of mutants by colony screening 
Random mutagenesis was carried out on a section of the Pfu-Pol gene (in pET-17b) flanked by unique EcoRV and SacI restriction sites, a region comprising bases 1293-1596 and amino acids 431-532 (i.e. the P-helix and surrounding amino acids). A PCR-based method in conjunction with the mutagenic base analogues 8-oxo-dGTP and dPTP was used. The mutagenic primers used were GAACTATGATATCGCTCC (EcoRV primer) and CTTTTCTTCGAGCTCCTTCCATACT (SacI primer). Initially, 30 rounds of PCR were carried out under the following conditions: 10 mM Tris-HCl, pH 8.8, 50 mM KCl, 1.5 mM MgCl2, 0.08% Nonidet P-40, 500 µM each dNTP (four normal and two mutagenic), 0.5 units of T. aquaticus DNA polymerase; cycle 94 °C for 1 min, 55 °C for 3 min, 72 °C for 2 min. The amplified products were used in a second PCR reaction (conditions identical to the first but with only the four normal dNTPs at 250 µM) to generate a library of mutated DNA fragments. The library was digested with EcoRV and SacI and ligated into pET-17b(Pfu-Pol) from which the EcoRV-SacI fragment of the wild-type Pfu-Pol gene had been excised. The resulting plasmids were used to transform E. coli BL21(DE3) to ampicillin resistance. Transformants containing Pfu-Pol variants better able to incorporate ddNTPs were selected by modifying a "colony screening rapid filter assay" usually used to detect DNA polymerase activity. Bacterial colonies were gridded onto duplicate LB/ampicillin agar plates and allowed to grow overnight at 37 °C. The colonies from
one plate were replica-plated onto a nitrocellulose filter containing activated calf-thymus DNA, which was then overlaid onto an LB/ampicillin plate containing 1 mM IPTG. After a further 4 hours at 37 °C, during which Pfu-Pol expression was induced, the nitrocellulose filters were removed and the cells lysed using toluene/chloroform. The nitrocellulose filters were soaked in 20 mM Tris-HCl, pH 8.5, 10 mM KCl, 20 mM (NH4)2SO4, 2 mM MgSO4, 0.1 mg/ml bovine serum albumin and 0.1% Nonidet P-40 and baked at 70 °C for 30 min to destroy E. coli DNA polymerases. The filters were re-soaked with this buffer, now also containing 12 µM each of dTTP, dCTP and dGTP plus 1 µl of [α-³²P]-ddATP (3000 Ci/mmol, Nycomed-Amersham), and then incubated at 70 °C for 30 min to allow ³²P incorporation into polymeric material. The filters were washed with trichloroacetic acid and pyrophosphate and dried, and any radioactivity retained on the filters (i.e. incorporated into polymeric material) was determined using autoradiography (30). Under these conditions wild type Pfu-Pol results in minimal retention of ³²P on the filters. Clones that were associated with ³²P retention were rescued from the duplicate LB/ampicillin plate and used both for the preparation of mutant Pfu-Pol (exactly as for the wild type enzyme) and for plasmid preparation for DNA sequencing.
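The base range (1293-1596) and the amino acid range (431-532) quoted for the mutagenised region are related by simple codon arithmetic. The Python sketch below shows the conversion; it assumes, because the numbering convention is not stated, that base 1 is the first base of the Pfu-Pol start codon and that the reading frame is uninterrupted. Under that assumption base 1293 is the third base of codon 431 and base 1596 is the last base of codon 532.

```python
from __future__ import annotations


def codon_number(base: int) -> int:
    """Return the 1-based codon (amino acid) number containing a 1-based base position.

    Assumption: base 1 is the first base of the start codon and the reading
    frame is uninterrupted.
    """
    if base < 1:
        raise ValueError("base positions are 1-based")
    return (base - 1) // 3 + 1


def codon_span(aa: int) -> tuple[int, int]:
    """Return the first and last base (1-based, inclusive) of codon number `aa`."""
    if aa < 1:
        raise ValueError("codon numbers are 1-based")
    start = 3 * (aa - 1) + 1
    return start, start + 2


# Mutagenised region quoted in the Methods: bases 1293-1596.
print(codon_number(1293), codon_number(1596))  # 431 532
print(codon_span(431), codon_span(532))        # (1291, 1293) (1594, 1596)
```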
Site-directed mutagenesis of Pfu-Pol 

Site-directed mutagenesis of the Pfu-Pol gene was carried out using the PCR-based "overlap extension" method (31). Most of the directed mutants were made to the P-helix, i.e. within the EcoRV/SacI fragment described above. Therefore, the EcoRV and SacI oligonucleotides were used as the common outer primers, along with appropriate primers containing the required mutation, to produce two overlapping DNA fragments. These two fragments, containing the desired mutation, were used as a template in a subsequent PCR reaction, together with the EcoRV and SacI primers, to generate the "full-length" mutated EcoRV/SacI fragment. The protocol for PCR and the subsequent cloning of the mutated fragment were as described above. The amino acid Y410 is not in the P-helix and so does not lie between the EcoRV and SacI restriction sites. The mutations Y410F/A were produced analogously using an EcoRI flanking primer (GGGAAAGAATTCCTTCC) and the SacI primer. The Y410 codon is located between the unique EcoRI and SacI restriction sites.
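As a consistency check on the flanking primers, the Python sketch below scans each primer sequence quoted in the Methods for the recognition sequences of EcoRV (GATATC), SacI (GAGCTC) and EcoRI (GAATTC), searching both strands. It assumes the primers are written 5' to 3'; the dictionary keys are illustrative labels only.

```python
from __future__ import annotations

# Recognition sequences (5'->3') of the three restriction enzymes named in the text.
SITES = {"EcoRV": "GATATC", "SacI": "GAGCTC", "EcoRI": "GAATTC"}

# Flanking primers as given in the Methods (assumed to be written 5'->3').
PRIMERS = {
    "EcoRV primer": "GAACTATGATATCGCTCC",
    "SacI primer": "CTTTTCTTCGAGCTCCTTCCATACT",
    "EcoRI primer": "GGGAAAGAATTCCTTCC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def site_positions(seq: str, site: str) -> list[int]:
    """0-based positions on the given strand where `site` starts (overlaps allowed)."""
    return [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]


def find_sites(seq: str) -> dict[str, list[int]]:
    """Map each enzyme to top-strand start positions of its site, searching both strands."""
    rc = reverse_complement(seq)
    hits: dict[str, list[int]] = {}
    for enzyme, site in SITES.items():
        top = site_positions(seq, site)
        # A site starting at j on the reverse complement starts at len(seq) - j - len(site) on the top strand.
        bottom = [len(seq) - j - len(site) for j in site_positions(rc, site)]
        hits[enzyme] = sorted(set(top + bottom))
    return hits


for name, primer in PRIMERS.items():
    found = {enzyme: pos for enzyme, pos in find_sites(primer).items() if pos}
    print(name, found)
```

Each primer carries exactly one site, that of the enzyme it is named after. All three sites are palindromic, so the reverse-strand search is redundant here; it is kept so the helper also works for non-palindromic sites.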
DNA sequencing using Pfu-Pol mutants 
DNA sequencing reactions were performed using cycle-sequencing. A kit purchased from Stratagene (Exo⁻ Cyclist DNA sequencing kit) was used together with M13mp18 single-stranded template DNA and a universal primer. Approximately 1-2 units of mutant polymerase were added per sequencing reaction, corresponding to the amount of wild type Pfu-Pol normally used. Reactions were initially performed at a 30:1 ddNTP:dNTP ratio, the optimal nucleotide ratio for the wild type enzyme. This ratio was progressively lowered for mutant enzymes that showed an increased preference for dideoxynucleotides.
Results 

Although the B-regions in polymerases belonging to the Pol-I and Pol-α families are similar rather than identical, it is clear that both play a role in dNTP binding. The amino acid sequence in the vicinity of the B-region for Pyrococcus furiosus polymerase (Pfu-Pol), the enzyme investigated in this study, is shown in Figure 2. For comparison, the homologous sequence of the structurally characterised Thermococcus gorgonarius enzyme (Tgo-Pol) is also shown. In Tgo-Pol the B-motif consists of an α-helix (the P-helix) flanked by loop regions (Figure 2). Tgo-Pol and Pfu-Pol have 79% sequence identity, and the variant amino acids invariably involve highly conservative changes. We have used Swiss-Model/Swiss-PdbViewer to deduce a structure for Pfu-Pol based on Tgo-Pol. As expected, the two structures are almost identical, and the derived P-helix from Pfu-Pol is shown in Figure 2. A variety of mutations has been made to these amino acids in Pfu-Pol (Table 1), using either random mutagenesis or back-to-back PCR site-directed mutagenesis (31). The mutants were expressed in E. coli, using pET-17b, and purified using a heat step followed by two chromatography columns. All proteins appeared pure on heavily loaded SDS-PAGE gels stained with Coomassie Blue.

The DNA-sequencing performance of the mutants was assessed using a standard cycle-sequencing protocol with the four [α-³²P]-ddNTPs as chain terminators. In our hands, wild-type Pfu-Pol gave good sequencing ladders (clearly defined bands, lack of non-specific termination, sequencing information spread over the entire size range of fragments) at a ddNTP:dNTP ratio of 30:1 (Figure 3). However, when a 5:1 ratio was used, much of the radioactivity was found towards the top of the gel (due to lower than optimal levels of termination giving rise to long products), and excessive non-specific termination was also seen, resulting in shadow bands across all four lanes. Both effects make it impossible to deduce the DNA sequence from the sequencing ladders. As improved DNA-sequencing performance correlates with increased selectivity for ddNTPs, we looked for mutants that give a readable sequencing ladder at ddNTP:dNTP ratios of < 30:1. Therefore, mutants were initially tested at the wild-type 30:1 ratio. Enzymes which gave readable sequencing gels, or were associated with increased radioactivity towards the bottom of the gel (indicative of better ddNTP incorporation resulting in shorter products), were then evaluated at a 5:1 ratio of ddNTP:dNTP and, if warranted, at progressively decreasing ratios of the two triphosphates.
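The screening scheme above amounts to a step-down search over decreasing ddNTP:dNTP ratios. The minimal Python sketch below illustrates it. The `is_readable` callback is a hypothetical stand-in for visual inspection of a sequencing gel, and, as a simplification, the sketch ignores the second entry criterion (increased radioactivity towards the bottom of a gel that was unreadable at 30:1). The ladder uses only the ratios that appear in this document.

```python
from __future__ import annotations

from fractions import Fraction
from typing import Callable, Optional, Sequence

# ddNTP:dNTP ratios written as ddNTP/dNTP, highest first: the wild-type
# optimum (30:1) followed by the lower ratios reported in the text.
RATIO_LADDER = (Fraction(30), Fraction(5), Fraction(1, 3), Fraction(1, 5))


def lowest_readable_ratio(
    is_readable: Callable[[Fraction], bool],
    ladder: Sequence[Fraction] = RATIO_LADDER,
) -> Optional[Fraction]:
    """Step down the ladder and return the last ratio that still gave a readable gel.

    `is_readable` is a hypothetical stand-in for judging a gel by eye. The
    search stops at the first unreadable ratio; None means even the first
    ratio in the ladder was unreadable.
    """
    lowest = None
    for ratio in ladder:
        if not is_readable(ratio):
            break
        lowest = ratio
    return lowest


# Illustrative enzyme (not real data) whose gels stay readable down to 1:3.
print(lowest_readable_ratio(lambda r: r >= Fraction(1, 3)))  # 1/3
```

Exact `Fraction` values are used so that ratios such as 1:3 compare without floating-point rounding.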

Initially, the four most highly conserved amino acids (Q484, K488, N492, Y495), on the side of the P-helix that faces the dNTP binding site (Figures 1 and 2), were investigated. Most of the mutants showed inferior sequencing ladders at the 30:1 ddNTP:dNTP ratio used for initial screening, when compared to the wild type enzyme (Table 1). The few mutants that appeared equivalent to the wild type under these conditions all gave unreadable sequencing gels, characterised by radioactivity towards the top of the gel and non-specific termination, at a 5:1 ratio of triphosphates. Both sequence and structural comparisons (Figure 1) show that the Pol-α family does not have an exact counterpart to the critical phe/tyr found in Pol-I enzymes. The best guess as to which amino acid, if any, would play this role is N492, a residue that is highly conserved (8, 9) and structurally at least in a similar location to the phe/tyr (Figure 1). However, N492Y (i.e. an attempt to introduce a tyr near a location shown to give good sequencing performance with the Pol-I family) gave an inactive enzyme. Other mutations to this amino acid, e.g. N492H/K/G, while active, resulted in difficult-to-read ladders at the initial 30:1 ddNTP:dNTP ratio. Changes to Q484, K488 and Y495 gave sequencing phenotypes at best equivalent to, but more commonly worse than, the wild type enzyme (Table 1).

As shown in Figure 1, the conserved amino acids on the same side of the O- or P-helix vary not only in the nature of their side chains but also in the spacing between them. The B-motif in the Pol-α family is missing an amino acid when compared to Pol-I. Alternative alignments to that shown in Figure 1, in which a gap in Pol-α is placed opposite the key phe/tyr in Pol-I, have been proposed (8, 9). Therefore, insertion mutations of two types have been prepared: addition of a single tyrosine immediately before N492 (A(Y)N), and replacement of several Pfu-Pol P-helix amino acids with the corresponding region of the T. aquaticus O-helix (TIN(Y)GVL) (Table 1). In these Pfu-Pol variants the "missing" amino acid is replaced with a tyrosine in either the Pfu-Pol or the T. aquaticus polymerase context, allowing a more exact sequence alignment of the mutated B-motif with that of the Pol-I family. These insertion mutations also represent another approach to placing a tyrosine near the position of the important phe/tyr in the Pol-I family. Unfortunately, all insertion mutations were inactive (Table 1).

Following alterations to the conserved amino acids and variation in their spacing, a number of mutations have also been made in most of the other P-helix amino acids. As can be seen in Table 1, the majority of the changes lead to sequencing performance roughly equivalent to that of the wild type polymerase, i.e. readable ladders at a ddNTP:dNTP ratio of 30:1 but no useful sequencing data at the 5:1 ratio, with most radioactivity at the top of the gel and non-specific termination. Therefore, most changes offered no improvement in DNA sequencing performance over that of the wild type. Three mutations, A486Y, L490W and Y497A, were found which gave readable sequencing ladders at the 5:1 ratio. Progressively decreasing the amount of chain terminator showed that A486Y could be used at a ddNTP:dNTP ratio of 1:5, L490W at 1:3 and Y497A at 5:1, representing improvements over wild type (in terms of ddNTP usage) of 150-, 90- and 6-fold respectively (Table 1, Figure 3). Other mutations to these amino acids, e.g. L490Y and Y497W, also resulted in preferential usage of ddNTPs when compared to the wild type, although the ddNTP:dNTP ratios usable with these mutants were not as good as with L490W and Y497A respectively. We also found certain mutations to these amino acids, e.g. A486W and Y497F (Table 1), that resulted in poorer ddNTP utilisation than the wild type enzyme.
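The fold improvements quoted here are the wild-type ratio divided by each mutant's lowest readable ratio, for example (30/1) ÷ (1/5) = 150 for A486Y. The short Python sketch below reproduces the arithmetic with exact fractions. These are empirical "readable gel" thresholds rather than kinetic discrimination constants, so the numbers inherit the subjectivity of judging gel readability by eye.

```python
from fractions import Fraction

# Lowest usable ddNTP:dNTP ratios reported in the text, written as ddNTP/dNTP.
WILD_TYPE = Fraction(30, 1)
MUTANTS = {"A486Y": Fraction(1, 5), "L490W": Fraction(1, 3), "Y497A": Fraction(5, 1)}


def fold_improvement(mutant_ratio: Fraction, wild_type_ratio: Fraction = WILD_TYPE) -> Fraction:
    """Factor by which the ddNTP:dNTP ratio needed for a readable ladder falls relative to wild type."""
    return wild_type_ratio / mutant_ratio


for name, ratio in MUTANTS.items():
    print(name, fold_improvement(ratio))
# A486Y 150
# L490W 90
# Y497A 6
```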

As well as changing the amino acids in the P-helix in a directed manner, random mutagenesis has been used between amino acids 431 and 532. This stretch of 101 amino acids encompasses the P-helix as well as amino acids flanking this structural element (Figure 2). Preparing a large number of random mutants is quick and straightforward; screening for an improved phenotype is often more tedious and difficult. We have adapted a "colony screening rapid filter assay", originally used to identify recombinants containing wild type T. aquaticus DNA polymerase, to directly assess improved incorporation of ddNTPs. Essentially, E. coli colonies on nitrocellulose filters containing activated calf-thymus DNA are incubated with [α-³²P]-ddATP and dTTP/dCTP/dGTP. Colonies expressing a Pfu-Pol mutant that effectively uses ddNTPs transfer radioactivity to polymeric DNA immobilised on the nitrocellulose filter, allowing subsequent detection by autoradiography. The method includes a heat step, prior to the assay, to destroy host polymerases and so automatically scores for thermostability in the mutant Pfu-Pol. Only one mutant that appeared to incorporate ddNTPs particularly well was found, and sequencing revealed it to be a double mutant, Q472H-N492H. This double mutant gave good sequencing data at a ddNTP:dNTP ratio of 5:1 (Table 1), i.e. a 6-fold improvement over wild type. However, the amount of ddNTP could not be further decreased, and so the double mutant does not appear to be as good as either A486Y or L490W. The amino acid at 492, normally asparagine, is one of the conserved P-helix amino acids described above (Figures 1 and 2). We therefore thought that the improvement in phenotype was due to the N492H change, a combination not tested above.

Remarkably, single directed mutants showed that the key change was Q472H, which gave good sequencing ladders, equivalent to those produced by the double mutant, at a 5:1 ddNTP:dNTP ratio. The single mutant N492H showed non-specific termination at the 5:1 ratio and in fact gave slightly inferior performance to the wild type when tested at 30:1 ddNTP:dNTP. The amino acid Q472 is located in a loop that connects the P-helix to another long α-helix (Figure 2), and its location is such that it is unlikely to interact directly with dNTPs. Not only does the mutation Q472H improve ddNTP incorporation, but it also fully rescues a mutant, N492H, whose sequencing performance is worse than that of the wild type.

Changes have also been made to Y410, an amino acid that is not in the P-helix but has been shown by crystallography to have an important role in binding dNTPs, forming the bottom of the nucleotide binding cleft. In our hands, both Y410F and Y410A offered no improvement over the wild type (Table 1).

Attempts to further reduce the amount of ddNTP needed for sequencing by combining the improved mutants A486Y, L490W and Q472H-N492H were unsuccessful. One double mutant, A486Y-L490W, behaved in an identical manner to the single mutant A486Y, i.e. gave readable sequencing gels at 1:5 ddNTP:dNTP (Table 1). The other two, A486Y-Q472H-N492H and L490W-Q472H-N492H, actually gave worse sequencing performance than A486Y and L490W alone, characterised by unreadable gels at ddNTP:dNTP ratios of 1:5 and 1:3 respectively.

As noted above, with the Pol-I enzymes a change of a single phe to a tyr converts ddNTPs from very poor to very good substrates (4). It was suggested that the 3'-OH of the natural dNTP substrate is one of the ligands for the essential Mg²⁺ required for polymerisation. With ddNTPs, which lack a 3'-OH, this interaction cannot take place, accounting for their poor substrate properties. Most Pol-I enzymes contain a phe near the sugar of the dNTP, which is sometimes replaced by tyr (either in a few natural Pol-I enzymes, e.g. that from T7, or by site-directed mutagenesis).



Further, changes to most of the amino acids likely to interact directly with the dNTP do not improve discrimination for ddNTPs over dNTPs. This includes the conserved amino acids on one side of the P-helix and Y410 (Figures 1 and 2, Table 1). None of these mutations improves ddNTP incorporation and sequencing performance (or else they give low specific activities, rendering the question of improved properties irrelevant), and an effect as profound as that caused by the single phe/tyr switch in the Pol-I family was never observed. Additionally, inserting a single amino acid into the P-helix of Pfu-Pol (to produce a better alignment with the O-helix of Pol-I) or swapping helices between the two families gave inactive enzyme. This probably arises because of a large overall disruption to Pol-α structure when its P-helix is replaced with the equivalent of an O-helix. Finally, changes to most of the other, less conserved, amino acids in the P-helix also lead to no improvement in selectivity for ddNTPs or in DNA sequencing performance (Table 1).

Changes at three positions within the P-helix, amino acids A486, L490 and Y497, give rise to a higher selectivity for ddNTPs. As shown in Figure 2, the side chains of these amino acids protrude from the P-helix on the side facing away from the DNA-binding cleft. The three amino acids cannot, therefore, interact directly with the dNTP, as postulated for the four highly conserved amino acids, Q484, K488, N492 and Y495. Mutagenesis of Vent™ polymerase also showed that changes to the equivalent of Pfu-Pol A486 (A488 in Vent™) lead to an increase in the efficiency of ddNTP usage (33). That study changed A488 to several alternatives and observed that the bigger the side chain, the better the utilisation of ddNTPs. The largest side chain used in that study was phe, which led to a boost in the incorporation of ddNTPs by a factor of about 15-fold, evaluated using sequencing gels. The change we made, A486Y, allows readable sequencing gels at a ddNTP:dNTP ratio of 1:5, a 150-fold improvement over the wild type. The determination of "sequencing performance" by visual inspection of gels is somewhat subjective; even under optimised conditions with the wild-type enzyme, non-specific termination is visible in a few places. Therefore, deciding the ddNTP:dNTP ratio
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which gives clearly interpretable patterns i.e. exactly when non-specific termination 
begins to interfere with readability is difficult. However, we routinely use the Pfu-Pol 
A486Y mutant for sequencing purposes at the 1 :5 ddNTP:dNTP ratio and find it 
gives reliable sequencing information. What is clear, though, is that this amino acid 
5 represents a residue important for ddNTP usage, even though any improvement is 
far less than the phe/tyr switch in the pol-I family. L490 is one turn along the helix 
form A486 and the side chains of these two amino acids point in almost identical 
directions (Figure 2). This amino acid was not studied with Vent™ but we have 
found that the mutation L490W improves ddNTP usage, by a factor only slightly less 
10 than the A486Y change. L490Y, while slightly improved over wild type, is far less 
effective than L490W. This may, like the changes to the alanine described above, 
result from a correlation between side chain size and ddNTP utilisation. However, as 
there are not many larger side chains than the wild type leu, it is difficult to test this 
experimentally with a series of mutations as carried out with Vent™ polymerase and 
15 A488 (32). Two other P-helix amino acids, L479 and Y497, have side chains that 
point in almost the same direction as those of A486 and L490. Changes to L479 give 
a phenotype similar to wild type (Table 1). However, some changes to Y497 
(Y497W and Y497A) lead to a slightly improved preference for ddNTPs (Table 1), 
whereas others Y497F decrease ddNTP utilisation. A similar effect was seen with 
20 Vent™ with the corresponding amino acid Y499 (33). 
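The fold improvements quoted in this section (150-fold for A486Y, 6-fold for Q472H) are consistent with dividing the wild-type ddNTP:dNTP ratio (30:1) by the lowest ratio at which a mutant still gives readable gels. The Python sketch below makes that arithmetic explicit. Treating "fold improvement" as this quotient is an assumption inferred from those figures, the dictionary name is illustrative, and the ratios are the ones reported in the text and Table 1.

```python
from fractions import Fraction

# Lowest ddNTP:dNTP ratio giving readable sequencing gels, as (ddNTP, dNTP).
# Values come from the text and Table 1; the dictionary name is illustrative.
READABLE_RATIOS = {
    "wild-type": (30, 1),
    "A486Y": (1, 5),
    "L490W": (1, 3),
    "L490Y": (5, 1),
    "Y497W": (10, 1),
    "Y497A": (5, 1),
    "Q472H-N492H": (5, 1),
}


def fold_improvement(mutant: str, reference: str = "wild-type") -> Fraction:
    """Assumed definition: reference ratio divided by mutant ratio.

    A lower ddNTP:dNTP ratio means less ddNTP is needed for readable
    termination, i.e. the enzyme uses ddNTPs more efficiently.
    """
    ref = Fraction(*READABLE_RATIOS[reference])
    mut = Fraction(*READABLE_RATIOS[mutant])
    return ref / mut


for name in READABLE_RATIOS:
    if name != "wild-type":
        print(f"{name}: {fold_improvement(name)}-fold")
```

With these values, A486Y gives 30 × 5 = 150 and Q472H-N492H gives 30 / 5 = 6. Exact fractions are used so that ratios below 1:1, such as 1:3 or 1:5, do not pick up floating-point rounding.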

Not only are A486 and L490, the amino acids in the P-helix which, when
changed, lead to the most pronounced preference for ddNTPs, unable to interact with
dNTPs directly, but they are also not highly conserved in the Pol-α family. Although the
equivalent position to A486 is most usually ala, other amino acids, e.g. asn, ser, ile,
leu and phe, are found (8, 9). Similarly, L490 is commonly replaced by either another
hydrophobic amino acid or thr/ser. This lack of conservation emphasises that A486 and
L490 are unlikely to have a direct critical function, e.g. in binding dNTP or
primer/template, or in catalysis.
The indirect action of the mutants A486Y and L490W was confirmed by the
identification of a third mutant associated with better ddNTP usage. Random
mutagenesis of a region including both the P-helix and its flanking sequences
revealed the double mutant Q472H-N492H. The improved properties do not result
from the change to N492 (the conserved N in the P-helix, Figures 1 and 2) but, as
shown by subsequent single mutants (Table 1), from the alteration to Q472. The residue
Q472 is in a loop that lies between the P-helix and a second long α-helix. These two
anti-parallel helices comprise the bulk of the fingers domain of the RB69 (27) and
Tgo (28) polymerases and presumably also of Pfu-Pol. Q472 is highly variable
between Pol-α enzymes and even differs within the more closely related archaeal
subset of this class (8, 9); e.g. this residue is ile in Tgo-Pol and Vent™ (Figure 2).
Q472H, or the double mutant Q472H-N492H, improves ddNTP incorporation, as
assessed by sequencing gels, by only 6-fold.

It is apparent that many modifications and variations of the invention as 
hereinabove set forth may be made without departing from the spirit and scope
thereof. The specific embodiments described are given by way of example only, and 
the invention is limited only by the terms of the appended claims. 
Table 1. DNA sequencing properties of Pfu-Pol mutants.

Category of enzyme                  Mutant                ddNTP:dNTP ratio optimized
                                                          for DNA sequencing ladders

Wild-type                           Wild-type             30:1

Alterations to highly conserved     Q484A                 Worse than wild-type
amino acids in P-helix (shown       K488A                 Worse than wild-type
in green in Figures 1 and 2)        N492H/K/G             Worse than wild-type
                                    N492Y                 Inactive enzyme
                                    Y495/I/D/C/S          Similar to wild-type

Insertions into P-helix*            A(Y)N and TIN(Y)GVL   Inactive enzyme

Alterations to other amino acids    L479Y/W/P             Similar to wild-type
in P-helix (see Figure 2)           A486Y                 1:5
                                    A486W                 Worse than wild-type
                                    L490W                 1:3
                                    L490Y                 5:1
                                    S493Y                 Similar to wild-type
                                    F494Y/C/S/T/V         Similar to wild-type
                                    G496P/S/A             Similar to wild-type
                                    Y497F                 Worse than wild-type
                                    Y497W                 10:1
                                    Y497A                 5:1

Alterations to Y410                 Y410A/F               Worse than wild-type

Alterations to loop preceding       Q472H                 5:1
the P-helix (see Figure 2)

Multiple mutations                  Q472H-N492H           5:1
                                    A486Y-L490W           1:5
                                    Q472H-A486Y-N492H     Worse than A486Y
                                    Q472H-L490W-N492H     Worse than L490W

The wild-type enzyme gives readable DNA sequencing gels at a 30:1
ddNTP:dNTP ratio, but unreadable gels at 5:1. Mutants described as
similar to wild-type have this property; mutants noted as worse give
unreadable gels at the 30:1 ratio. For mutants with improved
discrimination, the number given is the lowest ddNTP:dNTP ratio at
which readable sequencing gels were obtained. Y410F/A indicates that
Y410 was changed to both F and A, etc. *A(Y)N has a Y inserted
between A491 and N492. In TIN(Y)GVL, this sequence replaces
amino acids 489 to 494 (LLANSF).
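Table 1 uses a compact notation in which a slash separates alternative replacements at one position (Y410F/A means Y410 was changed to F and, separately, to A), while a hyphen joins changes combined in one enzyme (Q472H-N492H). The Python sketch below expands this notation into individual changes. The function names and the regular expression are illustrative; they are not part of the original work.

```python
import re

# Compact substitution label: wild-type residue, position, then one or more
# replacement residues separated by "/" (e.g. "N492H/K/G").
SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z](?:/[A-Z])*)")


def expand_substitutions(label: str) -> list[str]:
    """Expand e.g. 'Y410A/F' into ['Y410A', 'Y410F']."""
    match = SUBSTITUTION.fullmatch(label)
    if match is None:
        raise ValueError(f"not a compact substitution label: {label!r}")
    wild_type, position, replacements = match.groups()
    return [f"{wild_type}{position}{new}" for new in replacements.split("/")]


def split_combination(label: str) -> list[str]:
    """Split a combined mutant such as 'Q472H-A486Y-N492H' into single changes."""
    return label.split("-")


print(expand_substitutions("N492H/K/G"))
print(split_combination("Q472H-A486Y-N492H"))
```

The claims write the same double mutant as Q472H/N492H. Because the pattern requires a single letter after each slash, that form is rejected with a ValueError rather than being misread as a list of alternatives. Labels that do not fit the pattern, such as Y495/I/D/C/S as printed, are rejected in the same way.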
What is claimed is:

1. A purified recombinant thermostable DNA polymerase comprising the amino
acid sequence which corresponds to the nucleotide sequence set forth in Figure 4,
further modified to contain one or more amino acid changes selected from the
group consisting of Q472H/N492H, A486Y, L490W, L490Y, Y497A, Y497W,
N492H, Q472H/N492H, and A486Y/L490Y.

2. An isolated nucleic acid that encodes a thermostable DNA polymerase having the
peptide sequence of Claim 1.

3. A DNA polymerase encoded by the nucleic acid of Claim 2.

4. A recombinant DNA vector that comprises the nucleic acid of Claim 2. 

5. The recombinant vector of Claim 4 containing the plasmid pTM121.
6. A recombinant host cell transformed with the vector of Claim 4.

7. The recombinant host cell of Claim 6, wherein the cell is E. coli.

8. In a method of sequencing DNA which comprises the step of generating chain
terminated fragments from a DNA template to be sequenced with a DNA
polymerase, the improvement comprising utilizing, as said DNA polymerase, the
DNA polymerase of Claim 2 in the presence of at least one chain terminating
agent and one or more nucleotide triphosphates, and determining the sequence of
said DNA from the sizes of said fragments.

9. A kit for sequencing DNA comprising the DNA polymerase of Claim 2.
FIG. 4

Pfu pol exo- DNA sequence

ATGATTTTAGATGTGGATTACATAACTGAAGAAGGAAAACCTGTTATTAGGCTATTCAAAAAAGAGAACG Base pairs
TACTAAAATCTACACCTAATGTATTGACTTCTTCCTTTTGGACAATAATCCGATAAGTTTTTTCTCTTGC 1-70

^^^J^^ TA GAGCATGATAGAACTTTTAGACCATACATTTACGCTCTTCTCAGGGATGATTCAAA Base pairs

CTTTTrtAA i TCTATCTCGTACTATCTTGAAAATCTGGTATGTAAATGCGAGAAGAGTCCCTACTAAGTTT 71-140

GATTGAAGAAGTTAAGAAAATAACGGGGGAAAGGCATGGAAAGATTGTGAGAATTGTTGATGTAGAGAAG Base pairs

CTAACTTCTTCAATTCTTTTATTGCCCCCTTTCCGTACCTTTCTAACACTCTTAACAACTACATCTCTTC 141-210

GTTGAGAAAAAGTTTCTCGGCAAGCCTATTACCGTGTGGAAACTTTATTTGGAACATCCCCAAGATGTTC Base pairs

CAACTCTTTTTCAAAGAGCCGTTCGGATAATGGCACACCTTTGAAATAAACCTTGTAGGGGTTCTACAAG 211-280 

CCACTATTAGAGAAAAAGTTAGAGAACATCCAGCAGTTGTGGACATCTTCGAATACGATATTCCATTTGC Base pairs 

GGTGATAATCTCTTTTTCAATCTCTTGTAGGTCGTCAACACCTGTAGAAGCTTATGCTATAAGGTAAACG 281-350 

AAAGAGATACCTCATCGACAAAGGCCTAATACCAATGGAGGGGGAAGAAGAGCTAAAGATTCTTGCCTTC Base pairs

TTTCTCTATGGAGTAGCTGTTTCCGGATTATGGTTACCTCCCCCTTCTTCTCGATTTCTAAGAACGGAAG 351-420

GATATAGAAACCCTCTATCACGAAGGAGAAGAGTTTGGAAAAGGCCCAATTATAATGATTAGTTATGCAG Base pairs 

CTATATCTTTGGGAGATAGTGCTTCCTCTTCTCAAACCTTTTCCGGGTTAATATTACTAATCAATACGTC 421-490

ATGAAAATGAAGCAAAGGTGATTACTTGGAAAAACATAGATCTTCCATACGTTGAGGTTGTATCAAGCGA Base pairs

TACTTTTACTTCGTTTCCACTAATGAACCTTTTTGTATCTAGAAGGTATGCAACTCCAACATAGTTCGCT 491-560

GAGAGAGATGATAAAGAGATTTCTCAGGATTATCAGGGAGAAGGATCCTGACATTATAGTTACTTATAAT Base pairs

CTCTCTCTACTATTTCTCTAAAGAGTCCTAATAGTCCCTCTTCCTAGGACTGTAATATCAATGAATATTA 561-630

GGAGACTCATTCGCATTCCCATATTTAGCGAAAAGGGCAGAAAAACTTGGGATTAAATTAACCATTGGAA Base pairs

CCTCTGAGTAAGCGTAAGGGTATAAATCGCTTTTCCCGTCTTTTTGAACCCTAATTTAATTGGTAACCTT 631-700

GAGATGGAAGCGAGCCCAAGATGCAGAGAATAGGCGATATGACGGCTGTAGAAGTCAAGGGAAGAATACA Base pairs

CTCTACCTTCGCTCGGGTTCTACGTCTCTTATCCGCTATACTGCCGACATCTTCAGTTCCCTTCTTATGT 701-770

TTTCGACTTGTATCATGTAATAACAAGGACAATAAATCTCCCAACATACACACTAGAGGCTGTATATGAA Base pairs

AAAGCTGAACATAGTACATTATTGTTCCTGTTATTTAGAGGGTTGTATGTGTGATCTCCGACATATACTT 771-840

GCAATTTTTGGAAAGCCAAAGGAGAAGGTATACGCCGACGAGATAGCAAAAGCCTGGGAAAGTGGAGAGA Base pairs

CGTTAAAAACCTTTCGGTTTCCTCTTCCATATGCGGCTGCTCTATCGTTTTCGGACCCTTTCACCTCTCT 841-910

ACCTTGAGAGAGTTGCCAAATACTCGATGGAAGATGCAAAGGCAACTTATGAACTCGGGAAAGAATTCCT Base pairs

TGGAACTCTCTCAACGGTTTATGAGCTACCTTCTACGTTTCCGTTGAATACTTGAGCCCTTTCTTAAGGA 911-980

TCCAATGGAAATTCAGCTTTCAAGATTAGTTGGACAACCTTTATGGGATGTTTCAAGGTCAAGCACAGGG Base pairs

AGGTTACCTTTAAGTCGAAAGTTCTAATCAACCTGTTGGAAATACCCTACAAAGTTCCAGTTCGTGTCCC 981-1050

AACCTTGTAGAGTGGTTCTTACTTAGGAAAGCCTACGAAAGAAACGAAGTAGCTCCAAACAAGCCAAGTG Base pairs

TTGGAACATCTCACCAAGAATGAATCCTTTCGGATGCTTTCTTTGCTTCATCGAGGTTTGTTCGGTTCAC 1051-1120

AAGAGGAGTATCAAAGAAGGCTCAGGGAGAGCTACACAGGTGGATTCGTTAAAGAGCCAGAAAAGGGGTT Base pairs

TTCTCCTCATAGTTTCTTCCGAGTCCCTCTCGATGTGTCCACCTAAGCAATTTCTCGGTCTTTTCCCCAA 1121-1190

GTGGGAAAACATAGTATACCTAGATTTTAGAGCCCTATATCCCTCGATTATAATTACCCACAATGTTTCT Base pairs

CACCCTTTTGTATCATATGGATCTAAAATCTCGGGATATAGGGAGCTAATATTAATGGGTGTTACAAAGA 1191-1260

CCCGATACTCTAAATCTTGAGGGATGCAAGAACTATGATATCGCTCCTCAAGTAGGCCACAAGTTCTGCA Base pairs

GGGCTATGAGATTTAGAACTCCCTACGTTCTTGATACTATAGCGAGGAGTTCATCCGGTGTTCAAGACGT 1261-1330

AGGACATCCCTGGTTTTATACCAAGTCTCTTGGGACATTTGTTAGAGGAAAGACAAAAGATTAAGACAAA Base pairs

TCCTGTAGGGACCAAAATATGGTTCAGAGAACCCTGTAAACAATCTCCTTTCTGTTTTCTAATTCTGTTT 1331-1400

AATGAAGGAAACTCAAGATCCTATAGAAAAAATACTCCTTGACTATAGACAAAAAGCGATAAAACTCTTA Base pairs

TTACTTCCTTTGAGTTCTAGGATATCTTTTTTATGAGGAACTGATATCTGTTTTTCGCTATTTTGAGAAT 1401-1470

GCAAATTCTTTCTACGGATATTATGGCTATGCAAAAGCAAGATGGTACTGTAAGGAGTGTGCTGAGAGCG Base pairs

CGTTTAAGAAAGATGCCTATAATACCGATACGTTTTCGTTCTACCATGACATTCCTCACACGACTCTCGC 1471-1540
FIG. 4 Continued

TTACTGCCTGGGGAAGAAAGTACATCGAGTTAGTATGGAAGGAGCTCGAAGAAAAGTTTGGATTTAAAGT Base pairs 

AATGACGGACCCCTTCTTTCATGTAGCTCAATCATACCTTCCTCGAGCTTCTTTTCAAACCTAAATTTCA 1541-1610 

CCTCTACATTGACACTGATGGTCTCTATGCAACTATCCCAGGAGGAGAAAGTGAGGAAATAAAGAAAAAG Base pairs 

GGAGATGTAACTGTGACTACCAGAGATACGTTGATAGGGTCCTCCTCTTTCACTCCTTTATTTCTTTTTC 1611-1680

GCTCTAGAATTTGTAAAATACATAAATTCAAAGCTCCCTGGACTGCTAGAGCTTGAATATGAAGGGTTTT Base pairs 

CGAGATCTTAAACATTTTATGTATTTAAGTTTCGAGGGACCTGACGATCTCGAACTTATACTTCCCAAAA 1681-1750

ATAAGAGGGGATTCTTCGTTACGAAGAAGAGGTATGCAGTAATAGATGAAGAAGGAAAAGTCATTACTCG Base pairs 

TATTCTCCCCTAAGAAGCAATGCTTCTTCTCCATACGTCATTATCTACTTCTTCCTTTTCAGTAATGAGC 1751-1820

TGGTTTAGAGATAGTTAGGAGAGATTGGAGTGAAATTGCAAAAGAAACTCAAGCTAGAGTTTTGGAGACA Base pairs 

ACCAAATCTCTATCAATCCTCTCTAACCTCACTTTAACGTTTTCTTTGAGTTCGATCTCAAAACCTCTGT 1821-1890

ATACTAAAACACGGAGATGTTGAAGAAGCTGTGAGAATAGTAAAAGAAGTAATACAAAAGCTTGCCAATT Base pairs 

TATGATTTTGTGCCTCTACAACTTCTTCGACACTCTTATCATTTTCTTCATTATGTTTTCGAACGGTTAA 1891-1960 

ATGAAATTCCACCAGAGAAGCTCGCAATATATGAGCAGATAACAAGACCATTACATGAGTATAAGGCGAT Base pairs 

TACTTTAAGGTGGTCTCTTCGAGCGTTATATACTCGTCTATTGTTCTGGTAATGTACTCATATTCCGCTA 1961-2030 

AGGTCCTCACGTAGCTGTTGCAAAGAAACTAGCTGCTAAAGGAGTTAAAATAAAGCCAGGAATGGTAATT Base pairs 

TCCAGGAGTGCATCGACAACGTTTCTTTGATCGACGATTTCCTCAATTTTATTTCGGTCCTTACCATTAA 2031-2100

GGATACATAGTACTTAGAGGCGATGGTCCAATTAGCAATAGGGCAATTCTAGCTGAGGAATACGATCCCA Base pairs 

CCTATGTATCATGAATCTCCGCTACCAGGTTAATCGTTATCCCGTTAAGATCGACTCCTTATGCTAGGGT 2101-2170

AAAAGCACAAGTATGACGCAGAATATTACATTGAGAACCAGGTTCTTCCAGCGGTACTTAGGATATTGGA Base pairs 

TTTTCGTGTTCATACTGCGTCTTATAATGTAACTCTTGGTCCAAGAAGGTCGCCATGAATCCTATAACCT 2171-2240 

GGGATTTGGATACAGAAAGGAAGACCTCAGATACCAAAAGACAAGACAAGTCGGCCTAACTTCCTGGCTT Base pairs 

CCCTAAACCTATGTCTTTCCTTCTGGAGTCTATGGTTTTCTGTTCTGTTCAGCCGGATTGAAGGACCGAA 2241-2310

AACATTAAAAAATCCTAG Base pairs 
TTGTAATTTTTTAGGATC 2311-2328
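The residue numbers used throughout (Q472, A486, L490, N492, Y497 and so on) can be checked against the top strand of FIG. 4. This assumes the initiating ATG is taken as nucleotides 1-3, so that residue n is encoded by nucleotides 3n-2 to 3n. The Python sketch below performs this check for the excerpt covering base pairs 1401-1540, reading the OCR "7" characters as "T". The numbering convention is an assumption, the constant and function names are illustrative, and the codon table is the standard genetic code.

```python
from itertools import product

# Top strand of FIG. 4, bp 1401-1470 followed by bp 1471-1540.
SEGMENT_START_NT = 1401
SEGMENT = (
    "AATGAAGGAAACTCAAGATCCTATAGAAAAAATACTCCTTGACTATAGACAAAAAGCGATAAAACTCTTA"
    "GCAAATTCTTTCTACGGATATTATGGCTATGCAAAAGCAAGATGGTACTGTAAGGAGTGTGCTGAGAGCG"
)

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_for_residue(residue: int) -> str:
    """Codon for residue n, i.e. nucleotides 3n-2..3n with the ATG at 1-3."""
    start = 3 * residue - 2 - SEGMENT_START_NT  # offset into SEGMENT
    if start < 0 or start + 3 > len(SEGMENT):
        raise ValueError(f"residue {residue} lies outside this excerpt")
    return SEGMENT[start:start + 3]


def translate_residues(first: int, last: int) -> str:
    """Translate residues first..last (inclusive) from the excerpt."""
    return "".join(CODON_TABLE[codon_for_residue(n)] for n in range(first, last + 1))


def matches_wild_type(label: str) -> bool:
    """True if a label such as 'A486' names the residue encoded at that position."""
    position = int(label[1:])
    return translate_residues(position, position) == label[0]


for label in ["Q472", "L479", "Q484", "A486", "K488", "L490",
              "N492", "S493", "F494", "Y495", "G496", "Y497"]:
    print(label, codon_for_residue(int(label[1:])), matches_wild_type(label))

print(translate_residues(489, 494))  # LLANSF
```

Every label in the loop prints True, and translating residues 489-494 returns LLANSF, the stretch named in the Table 1 footnote. Residues outside the excerpt, such as Y410, raise a ValueError rather than returning a wrong codon.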